Transformation pipeline for the training dataset

  1. Read audio wave from filepath
    1. Read wav file (tf.io.read_file)
    2. Decode wav file (tf.audio.decode_wav)
  2. Remove silence from the beginning and the end (tfio.audio.trim) (OPTIONAL)
  3. Limit audio to a fixed number of seconds
    • Shorter audio -> Pad the end with zeros
    • Longer audio -> Random crop
  4. Data augmentation over audio wave
    • Change Speed
    • Pink noise
    • Gaussian noise
    • Gaussian noise at a target SNR (signal-to-noise ratio)
    • Gain (Volume Adjustment)
  5. Convert audio to mel spectrogram
    1. Convert audio to spectrogram (tfio.audio.spectrogram)
    2. Apply the Mel scale (tfio.audio.melscale)
    3. Apply the DB scale (tfio.audio.dbscale)
  6. Data augmentation over mel spectrogram
    • Time Warping (tfa.image.sparse_image_warp) (from the SpecAugment paper)
    • Time Masking (tfio.audio.time_mask) (from the SpecAugment paper)
    • Frequency Masking (tfio.audio.freq_mask) (from the SpecAugment paper)
    • Mixup
    • Any other image transformation
  7. Add the coordconv channel (OPTIONAL)
  8. Normalize (standardize to zero mean and unit variance)
    • For transfer learning, use the mean and std the pretrained model was trained with
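
Steps 3 and 4 can be sketched as below. This is a minimal illustration in plain NumPy, not the TensorFlow implementation the outline refers to. The function names (fix_length, add_gaussian_snr, apply_gain), the 16 kHz sample rate and the 2-second target length are assumptions made for the example.

```python
import numpy as np


def fix_length(wave, target_len, rng):
    """Pad short clips with trailing zeros; randomly crop long ones."""
    n = wave.shape[0]
    if n < target_len:
        # np.pad defaults to constant mode with zeros
        return np.pad(wave, (0, target_len - n))
    # Upper bound is exclusive, so the crop always fits inside the clip
    start = rng.integers(0, n - target_len + 1)
    return wave[start:start + target_len]


def add_gaussian_snr(wave, snr_db, rng):
    """Add white Gaussian noise scaled to reach the requested SNR in dB."""
    signal_power = np.mean(wave.astype(np.float64) ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return (wave + noise).astype(wave.dtype)


def apply_gain(wave, gain_db):
    """Scale the amplitude by a gain expressed in decibels."""
    return wave * (10.0 ** (gain_db / 20.0))


# Hypothetical inputs: random signals standing in for decoded audio
rng = np.random.default_rng(0)
sample_rate = 16000              # assumed sample rate
target_len = 2 * sample_rate     # assumed fixed length of 2 seconds
short_clip = rng.normal(size=1 * sample_rate).astype(np.float32)
long_clip = rng.normal(size=5 * sample_rate).astype(np.float32)

padded = fix_length(short_clip, target_len, rng)
cropped = fix_length(long_clip, target_len, rng)
augmented = apply_gain(add_gaussian_snr(cropped, snr_db=10.0, rng=rng), gain_db=-3.0)
```

In a real pipeline each augmentation would usually be applied with some probability and with parameters drawn from a range, so that every epoch sees a different variant of the same clip.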

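Steps 6 and 8 can be sketched in the same way. Again this is a NumPy illustration rather than the tfio ops listed in the outline. It assumes a spectrogram laid out as [time, mel], masks by zeroing, and draws the mixup weight from a Beta(alpha, alpha) distribution. All function names and parameter values are hypothetical.

```python
import numpy as np


def time_mask(spec, max_width, rng):
    """Zero a random run of consecutive time frames (axis 0)."""
    width = rng.integers(0, max_width + 1)
    start = rng.integers(0, spec.shape[0] - width + 1)
    out = spec.copy()
    out[start:start + width, :] = 0.0
    return out


def freq_mask(spec, max_width, rng):
    """Zero a random run of consecutive mel bins (axis 1)."""
    width = rng.integers(0, max_width + 1)
    start = rng.integers(0, spec.shape[1] - width + 1)
    out = spec.copy()
    out[:, start:start + width] = 0.0
    return out


def mixup(x1, y1, x2, y2, alpha, rng):
    """Blend two examples and their one-hot labels with the same weight."""
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2


def standardize(spec, mean, std, eps=1e-6):
    """Shift and scale to zero mean and unit variance for the given stats."""
    return (spec - mean) / (std + eps)


# Hypothetical inputs: random arrays standing in for two mel spectrograms
rng = np.random.default_rng(0)
spec_a = rng.normal(size=(100, 64))   # 100 frames, 64 mel bins (assumed)
spec_b = rng.normal(size=(100, 64))
label_a = np.array([1.0, 0.0, 0.0])   # one-hot labels, 3 classes (assumed)
label_b = np.array([0.0, 1.0, 0.0])

masked = freq_mask(time_mask(spec_a, 20, rng), 8, rng)
mixed_x, mixed_y = mixup(spec_a, label_a, spec_b, label_b, alpha=0.4, rng=rng)
# Statistics from the data itself; with a pretrained model, pass its stats instead
normed = standardize(spec_a, spec_a.mean(), spec_a.std())
```

Mixup needs two examples at once, so in practice it is usually applied per batch rather than per sample, after the other spectrogram augmentations.
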
Past Kaggle audio competitions